How to get the full contents of a node using XPath & lxml?

I am using lxml's xpath function to retrieve parts of a webpage. I am trying to get contents of a tag, which includes html tags of its own. If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]

I get the right number of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>).

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()

I get exactly what I want, except that I don't get any of the HTML code which is contained within the nodes.

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/node()

I get a mixture of text and lxml elements! (e.g. something <Element a at 0x102ac2140> something)

Is there any way to use a pure XPath query to get the contents of the nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?

Note that I'm returning a list of many nodes from the XPath query so the solution needs to support that.

Just to clarify: I want to return

<a href="url">inside</a> something

from something like

<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>

2 Answers

#1


2  

I'm not sure I understand -- is this close to what you are looking for?

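import io
import lxml.etree as le

# Sample markup shaped like the node the question's XPath selects
content = '''\
<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>
'''
doc = le.parse(io.StringIO(content))

# Select the element children of every matching <font> node
xpath = '//font[@face="verdana" and @color="#ffffff" and @size="2"]/child::*'
x = doc.xpath(xpath)
# tostring() serializes each element together with its tail text
print([le.tostring(el, encoding='unicode') for el in x])
# ['<a href="url">inside</a> something']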

#2


2  

Is there any way to use a pure XPath query to get the contents of the nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?

Short answer: No.

XPath doesn't operate on "tags" but on nodes.

The selected nodes are represented as instances of specific objects in the language that is hosting XPath.
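
To make this concrete for lxml, here is a small sketch of what .xpath() hands back for the three kinds of query in the question. The sample markup is an assumption chosen for illustration; the point is that element steps yield element objects, text() yields string-like objects, and node() yields both interleaved.

```python
import lxml.etree as le

# Assumed sample markup, not taken from the original page
root = le.fromstring('<font>something <a href="url">inside</a> something</font>')

elements = root.xpath('//font')      # list of element objects
texts = root.xpath('//font/text()')  # list of string-like objects
mixed = root.xpath('//font/node()')  # text and elements interleaved

# Classify each item returned by node()
kinds = ['element' if le.iselement(item) else 'text' for item in mixed]
print(kinds)
# ['text', 'element', 'text']
```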

If you need the string representation of a particular node's markup, such objects typically support an outerXml property -- check the documentation of the hosting environment (lxml in this case).

As @Robert-Rossney pointed out in his comment: lxml's tostring() method is equivalent to other environments' outerXml property.
